A hybrid demultiplexing strategy that improves performance and robustness of cell hashing

Abstract Cell hashing, a nucleotide barcode-based method that allows users to pool multiple samples and demultiplex in downstream analysis, has gained widespread popularity in single-cell sequencing due to its compatibility, simplicity, and cost-effectiveness. Despite these advantages, the performance of this method remains unsatisfactory under certain circumstances, especially in experiments that have imbalanced sample sizes or use many hashtag antibodies. Here, we introduce a hybrid demultiplexing strategy that increases accuracy and cell recovery in multi-sample single-cell experiments. This approach correlates the results of cell hashing and genetic variant clustering, enabling precise and efficient cell identity determination without additional experimental costs or efforts. In addition, we developed HTOreader, a demultiplexing tool for cell hashing that improves the accuracy of cut-off calling by avoiding the dominance of negative signals in experiments with many hashtags or imbalanced sample sizes. When compared to existing methods using real-world datasets, this hybrid approach and HTOreader consistently generate reliable results with increased accuracy and cell recovery.


Introduction
Recent technical advances in single-cell sequencing have greatly benefited biological research by enhancing our ability to investigate cellular mechanisms of homeostasis and disease in a more precise, high-resolution, and multi-omic fashion [1][2][3].In the past decade, numerous single-cell methods have been proposed to improve the quality, magnitude, modality, and economy of singlecell experimental approaches [1,[4][5][6][7][8][9][10].Among them, sample pooling and demultiplexing can greatly reduce the per-cell cost and have therefore been extensively used.
Sample multiplexing approaches in single-cell sequencing were first proposed in 2015 [11]; since that time, many variants of this approach have been developed [12].For single-cell RNAseq, current demultiplexing methods fall into several major technical solutions, including nucleotide barcode-based methods (e.g.cell hashing, MULTI-seq, ConA-based sample-barcoding strategy, BART-Seq, and ClickTags), natural genetic variationbased methods (e.g.demuxlet, scSplit, Vireo, and Souporcell), and vector-based barcoding methods [8][9][10][13][14][15][16][17][18][19].Among them, cell hashing has been one of the most popular approaches due to its ease of compatibility with existing experimental and data processing pipelines [20].Specifically, hashing involves staining cells from each sample with uniquely barcoded antibodies [also called hashtags, or hashtag oligonucleotides (HTOs)] that recognize ubiquitously expressed surface antigens, such as CD298 and β2-microglobulin [21].The HTO-labeled cells can then be pooled and prepared for single-cell sequencing, and the sample identities of each cell can be determined based on the relative enrichment level of each hashtag using computational approaches.
Despite the widespread usage of cell hashing, certain issues exist due to the nature of this technique.Ideally, cells from a hashing experiment should be divided into two groups: negatives (no signal to any hashtag) and singlet positives (positive signal to one hashtag).However, interpreting the results of real-world datasets can be more complex and present additional challenges.First, a background signal may be present for a given hashtag, producing false singlets (unlabeled cells categorized into a random singlet).Furthermore, cross-contamination caused by hashtag spill-over during sample pooling often leads to the presence of false doublets (single cells positive for multiple hashtags), whose frequency increases with the number of hashtags being used.Finally, improper staining conditions can occasionally cause hashtag inefficiency, resulting in a significant number of negative cells (cells with no hashtag signal).These problems are common among real-world datasets and difficult to avoid experimentally.
To demultiplex cell hashing experiments, numerous computational methods have been developed.Among them, HTOdemux (Seurat), Multi-seq, GMM_Demux, BFF, and DropletUtils have been shown to work well with benchmark datasets and therefore gained widespread usage [13,[22][23][24][25].Among these methods, HTOdemux groups cells based on their cell hashing profiles using k-medoids clustering, while MULTI_seq employs a similar heuristic approach that defines positives and negatives based on local maxima of the probability density of central-log-transformed cell hashing counts.DropletUtils assumes a specified difference (logfold change) between negative and positive cell hashing profiles, GMM-Demux decomposes centered log-ratio (CLR)-transformed HTO counts into two Gaussian distributions representing positive and negative groups and BFF, the most recent method, classifies cells in the Bimodal Quantile Normalization (BQN) space developed alongside BFF models.Despite using different mathematical models, these methods all function by determining the appropriate cutoff for each hashtag based on the bimodal distribution of hashtag counts (or normalized counts) [24].As a result, these methods can perform poorly in real-world datasets when actual distributions differ significantly from the assumptions made by the models.For example, HTOdemux, GMM-Demux, Multi-seq, and DropletUtils have been reported to perform poorly on datasets that have insufficient distinction between two peaks (even multiple peaks) of hashtag distributions [23,24].Together, the performance of current demultiplexing methods for cell hashing can vary significantly among different real-world datasets.
Recently, multiple genetic variation-based demultiplexing methods have been developed that determine donor identity by analyzing natural single nucleotide polymorphisms (SNPs) within sequenced DNA or RNA.Here, we propose a hybrid strategy that improves the overall performance of existing approaches by integrating the results of SNP-based demultiplexing with cell hashing.This approach is further complemented by HTOreader, a novel tool that improves the cutoff calling of individual hashtag distributions in real-world datasets that have imbalanced sample sizes or use many hashtags.For both benchmark and real-world datasets, our hybrid strategy significantly increases accuracy and cell recovery rate when compared to existing demultiplexing approaches for cell hashing alone.Significant benefits are also observed in datasets that have a low recovery rate due to poor hashtag staining quality.In applying our hybrid approach to two datasets from the same group of human donors, one stained with hashtag and one without, we find that performance remains consistent even among unstained cells; this indicates that reagent costs can be substantially decreased by only hashing a small number of cells from each sample.Furthermore, by integrating results from two methods based on completely different mechanisms (hashing and SNPs), this hybrid approach is inherently self-validating; potential errors that arise from one method can be easily identified by the other, making hybrid demultiplexing more reliable than other unimodal methods.Finally, rather than proof of concept, we demonstrate the workf low and performance of this strategy in a real-world B cell study.Together, our hybrid strategy provides a robust, accurate, and cost-effective solution for demultiplexing singlecell RNA sequencing datasets.

Stoeckius-2018
A single-cell dataset from human peripheral blood mononuclear cells has been previously published [8].This dataset is comprised of eight individual donors that are uniquely labeled by eight cell hashtags.This dataset can be accessed from Gene Expression Omnibus (GEO) repository under accession number GSE108313.

3V007
A novel single-cell dataset was generated for this paper.In this dataset, mRNA, B cell receptor (BCR) repertoire, surface protein expression (CD27 and CD79b), and antigen-CD79b ratios of 14 antigen-probes, including HA proteins of several endemic inf luenza strains, were measured.Two groups of cells (from the same donor), antigen-specific B cells (inf luenza HA-specific), and carrier cells (T cells and B cells) were uniquely labeled using two cell hashtags, hashtag1 and hashtag2, respectively.This dataset has been uploaded to GEO (see the data availability statement for details).

S414
A novel single-cell dataset was generated for this paper.In this dataset, mRNA, BCR repertoire, surface protein expression (CD27 and CD79b), and predicted binding specificity against eight antigen probes, including HA proteins of several endemic inf luenza strains, were measured.Four groups of cells (from the same donor) were uniquely labeled by four cell hashtags, hashtag3, hashtag4, hashtag5, and hashtag6.We split the carrier cells (T cells with a few B cells) into two groups, labeled them with hashtag3 and hashtag4; and split the antigen-specific B cells into two groups, labeling them with hashtag5 and hashtag6.This dataset has been uploaded to GEO (see the Data availability statement for details).

R6
This dataset has been previously published [26].In this dataset, mRNA, BCR repertoire, and predicted binding specificity against 17 antigen probes, including Spike, NP, ORF8, and RBD proteins of the endemic and pandemic COVID strains, HA proteins of the inf luenza virus and interferon alpha and omega, were measured.Cells from two time points of an individual human donor were uniquely labeled by two cell hashtags: hashtag3-early and hashtag4-late.This dataset (R6) is available from Mendeley Data: https://doi.org/10.17632/3jdywv5jrv.3.

9pool-CA
A novel single-cell dataset was generated for this paper (CA is short for carrier).In this dataset, we sorted B cells and T cells from nine subjects and pooled them together.mRNA and surface protein expression panels were measured.Cells from nine individual human donors were uniquely labeled by nine cell hashtags: hashtag1-hashtag9.This dataset has been uploaded to GEO (see the Data availability statement for details).

9pool-AS
A novel single-cell dataset was generated for this paper (AS is short for antigen-specific).In this dataset, we sorted antigenspecific B cells from nine human donors (as in dataset 8pool) and pooled them together.mRNA, BCR repertoire, surface protein expression (CD27 and CD79b), and predicted binding specificity against 18 antigen probes, including HA proteins of several endemic inf luenza strains, were measured.This dataset has been uploaded to GEO (see the Data availability statement for details).

8pool-AS
A novel single-cell dataset was generated for this paper.In this dataset, we sorted antigen-specific B cells from eight subjects and pooled them together.mRNA, BCR repertoire, surface protein expression (CD27 and CD79b), and predicted binding specificity against several antigen probes, including HA proteins of several endemic inf luenza strains, were measured.This dataset has been uploaded to GEO (see the Data availability statement for details).

8pool-CA
A novel single-cell dataset was generated for this paper.In this dataset, we sorted B cells and T cells from eight subjects (as in Dataset 8) and pooled them together.mRNA and surface protein expression panels were measured.Cells from eight individual human donors were uniquely labeled by eight cell hashtags: hashtag1-hashtag8.This dataset has been uploaded to GEO (see the Data availability statement for details).

Cell hashing demultiplexing using HTOreader
We developed and introduced an improved demultiplexing approach for single-cell cell hashing, called HTOreader.To accurately determine the hashtag identity for each individual cell, we developed a cutoff calling method that precisely distinguishes true positive from background.Specifically, the distributions of normalized counts for each hashtag were first fitted into two Gaussian distributions, representing background and true positive groups.Then a cutoff value that distinguishes the two groups was calculated based on the means and standard divisions of these two Gaussian distributions.Finally, the identity of each individual cell was determined according to the hashtags they were positive for.

Data normalization
Two normalization methods-CLR and Log (log1p) normalization-are available.CLR method is more common in the normalization of CITE-seq protein expression and hashtags [22,27].For a given raw count vector W of a hashtag, the CLR normalization will be: where n is the length of the vector W and g(w) denotes the geometric mean of W.
The conventional log normalization also works well in some datasets.We use log 1p function (log 1p(N) = log (N + 1)) to avoid the undefined log(0).For a given raw counts vector W of a hashtag, the log normalization will be: In datasets that incorporated a large number of hashtags in a single experiment (e.g.N > 4), we noted that for each hashtag signal, the "negative" peak significantly outweighs the "positive" peak.This is because the true positive cells for the current signal typically constitute 1/N of the entire population.Consequently, the two normal distributions fitted on the original data may be inaccurate.To address this, we perform a sampling for cells with low signals prior to model fitting when hashtag number N > 4.During the sampling phase, HTOreader employs an empirical cutoff to approximately identify low signals (1.5 for CLR and 3 for the log normalization method by default).We then arrange all cells with low signals in ascending order and select every N/2 cells (if N/2 is not an integer, use the f loor value).Let NS denote a set of normalized signals and the negative signals and positive signals are two subsets of NS: where c is the cutoff.Then, we sort the NS neg in ascending order as NS neg .Then, the sampled subset: where θ = f loor (N/2) and x ∈ NS neg .The reshaped set will be: Following this, we amalgamate the low signal sampling with all high signals as the reshaped data for the subsequent model fitting.

Mixture modeling
Mixture modeling has been extensively used in single-cell data pre-processing, such as to estimate drop-out rates or determine effective sequencing depth and amplification noise [28,29].We adopted a mixture modeling approach implemented in the Flexmix package to fit two Gaussian distributions from a vector of normalized hashtag counts [30].In this step, we fit the normalized data of each hashtag into two Gaussian distributions indicating one positive group, representing background and true positive groups, and calculate the means and standard deviations of these two groups, respectively.

Cutoff determination
For two Gaussian distributions N μ 1 , σ 2 1 and N μ 2 , σ 2 2 , μ 1 < μ 2 , we determine the cutoff to distinguish true positive and background using the following equation: where n is the rank of the model and the recommended rank is 2 in most cases.To mitigate the risk of inaccurate cutoff calls, we introduced an additional validation step in HTOreader.
If the selected threshold falls below a predefined empirical limit (defaulting to 1.5 for CLR and 3 for the log normalization method), it is automatically adjusted to the global minimum value based on the linear approximated density of the normalized cell hashing signal between the means of the two fitted distributions.

Sample identity assignment using cell hashing
We assign cells' sample identities based on the binding status of each hashtag [22].If a cell is deemed positive for only one hashtag, it will be labeled as a singlet for that corresponding hashtag; if it's deemed positive for multiple hashtags, it will be labeled as a doublet; if it's deemed background for all hashtags, it will be labeled as negative.Sample identities for every cell labeled as singlet will be assigned according to their hashtag identities.

Genomic signature demultiplexing
Genomic signature demultiplexing is an essential part of this hybrid strategy, and to date, many computational methods are available, for example, demuxlet and Souporcell.In this paper, we applied Souporcell, which has been widely used in the community, as part of our strategy to demonstrate the effectiveness of this workf low.As most SNP-based demultiplexing methods do, Souporcell aligns all short reads against a reference genome to determine SNPs for each cell and then groups cells into multiple clusters according to their genotypes (SNP signatures) using an unsupervised learning algorithm.The number of genotype clusters is pre-defined by users according to the number of subjects in the pooled sample.For a sample pooled from N subjects (individual human donors), there will be N + 2 distinct genotype clusters identified: one 'doublet' cluster containing cells that fit with more than one genotypes and one 'negative' cluster containing cells whose SNP signatures are not sufficient and therefore cannot be fit into any genotype, and N singlet clusters indicating cells from N individual donors.

Sample identity assignment in hybrid demultiplexing
First, we correlate results from both cell hashing demultiplexing and SNP-based demultiplexing to reveal the sample identity of genotype clusters.For each genotype group, the highestcorrelated cell hashing group represents its sample identity (Supplementary Fig. S3A).Then, focusing on cells that are assigned as singlets by both methods (highlighted by red border in Supplementary Fig. S3A), we define the number of cells that received consistent assignment from both methods as N1 (highlighted by green background) and the number of cells that received inconsistent assignment from both methods as N2 (highlighted by orange background).So the convergence score is defined as: If the convergence score exceeds a pre-defined threshold (a number between 0 and 1, with a default value of 0.7), as seen in 8pool-CA (convergence score = 0.99), we deem the experiment quality to be satisfactory.Subsequently, sample labels were assigned to all cells based on their assignments from both methods (Supplementary Fig. S3B).Cells receiving consistent labels from both methods were assigned the consistent label as their sample label.For cells receiving inconsistent labels, there're three cases: • Case 1: cells that are assigned as 'doublet' or "negative" by cell hashing methods but are assigned as "singlet" by the SNP-based method.Labels of these cells will be assigned as their genotype labels because they are "false doublet" or "false negative" cells due to the hashtag contamination or failure.• Case 2: cells that are assigned as "singlet" by cell hashing methods but are assigned as "doublet" or "unassigned" by the SNP-based method.Labels of these cells will be assigned as "unassigned" since current data are insufficient to infer their sample identification with high confidence.• Case 3: cells were assigned as singlet by both methods; however, the sample identities assigned by the two methods are inconsistent.Labels of these cells will be assigned as "unassigned" since current data is insufficient to infer their sample identification with high confidence.
If the convergence score falls below the threshold, as observed in 9pool-CA (score = 0.1 and 0.55 for Supplementary Tables S14  and S15, respectively), it suggests a potential high doublet rate or disrupted genetic background among samples in the current dataset.Cell labels will then be assigned based on their cell hashing labels.Additionally, a warning message will be displayed to alert the user that the demultiplexing results may not be accurate and that the data quality requires the user's attention and biological knowledge.

Cell hashing demultiplexing comparison
We conducted demultiplexing using popular existing methods, including MULTI-seq, GMM-Demux, DropletUtils, BFF_raw, and BFF_cluster, utilizing the R package cellhashR (version 1.0.3) in R 4.2.2.Additionally, we ran HTOreader using the R package HTOreader (version 0.1.0)and HTOdemux using the R package Seurat (version 4.3.0.1) in R 4.2.2.For cellhashR and HTOreader, all parameters were set to default.For Seurat, the positive quantile was set to 0.99, with the remaining parameters set to their default values.
We used the following measurements to evaluate the performance of all tested methods: singlet rate, accuracy, recall, precision, and F1-score.The singlet rate is defined as the percentage of identified singlets in all cells.For benchmark dataset Stoeckius-2018, we utilized ground truth labels generated using a SNPbased demultiplexing method, scSplit (see Supplementary Data for details).For in-house datasets 8pool-CA and R125, we generated ground truth labels using Souporcell.We classify singlets correctly identified as singlets as true positives (TP), doublets/negative cells accurately identified as doublets/negative cells as true negatives (TN), doublets/negative cells misclassified as singlets as false positives (FP) and singlets erroneously labeled as doublets/negative as false negatives (FN).Then, we define:

Flow cytometry staining and cell sorting
Flow staining and cell sorting were performed as previously described [26].Brief ly, human PBMCs were thawed in 10% FBS RPMI-1640 medium and enriched by negative selection using a pan-B cell isolation kit according the manufacturer's instruction (StemCell, Cat#.19554) prior to staining with the following antibodies and f luorescent oligonucleotide-labeled streptavidinantigen tetramers (Biolegend): anti-huCD19-PE-Cy7, anti-huCD3-BB515, anti-huCD4-BB515, anti-huIgD-BB515, TotalSeq-C antihuman hashtag antibodies, antigen-PE or-APC and at 4 • for 30 min.Cells were subsequently washed three times with 2% FBS/PBS buffer supplemented with 2 mM D-biotin.Finally, cells were adjusted at a maximum of 2 million cells per ml in washing buffer, stained with DAPI and subjected to sorting by either MACSQuantTyto (Miltenyi) or BD Melody (BD).Cells that were viable/CD19 + /antigen-PE + and antigen-APC + or viable/CD4 + were sorted for downstream 10X Genomics processing.

10X genomics libraries construction and next generation sequencing
5 gene expression, VDJ, and surface protein feature libraries were prepared using the 10X genomics platform as per the manufacturer's instructions [Chromium Next GEM Single Cell 5 (HT) Reagent Kits v2 (Dual Index)].Three libraries were quantified by real-time quantitative PCR using KAPA Library Quantification Kits (Roche) and pooled at recommended ratio and sequenced using NextSeq1000 (Illumina) with 26 cycles for read 1, 10 cycles for i7/i5 index and 150 cycles for read 2.

Overview of the hybrid demultiplexing strategy
The hybrid demultiplexing strategy addresses challenges in cell hashing experiments and existing demultiplexing methods, aiming to deliver accurate results across diverse experimental configurations.It incorporates two parallel demultiplexing solutions, cell-hashing-based demultiplexing and SNP-based demultiplexing, enhancing performance in a multi-modal fashion (Fig. 1A).Notably, the cell-hashing-based solution is sensitive to staining quality and experimental configurations, while the SNP-based solution consistently achieves high accuracy but lacks access to sample identity without additional experimental efforts.The hybrid strategy integrates results from both solutions, as illustrated in Figure 1A, allows mutual validation and reveals cell identities in an accurate and reliable manner.
The workf low for hybrid demultiplexing takes cell hashtag feature counts and raw sequencing reads as inputs for hashing-and SNP-based solutions, respectively.Downstream, pre-processing occurs in the form of HTOreader, an improved cut-off calling algorithm for cell hashing distributions, and Souporcell, an unsupervised clustering algorithm based on SNP variants (Fig. 1B) [10].We developed HTOreader to consistently generate accurate hashtag threshold results for a variety of real-world datasets through a streamlined pipeline consisting of four steps (Fig. 1B).First, the raw counts of each individual hashtag are normalized.Next, the normalized counts are fit into two Gaussian distributions via a mixture model, representing background and true positive groups.Importantly, for datasets with more than four hashtags, the normalized data will be sampled before model fitting to avoid the dominance of negative signals.Subsequently, the optimal cutoff of each hashtag that distinguishes the two inferred distributions is determined.Finally, the sample identity of each cell is assigned based on its hashtag profile (Fig. 1B).For SNP-based demultiplexing, we employed Souporcell, which is comprised of three major steps: feature (genetic variant; SNP) selection, counting cell alleles and genotype-based clustering [10].As the final step, the hybrid strategy correlates demultiplexing results from both methods and then reveals the sample identities for each cell.Singlet groups from the two methods are combined to unveil the sample identity of genotype clusters.Additionally, a convergence score was generated to assess the demultiplexing quality and identify potential errors.Subsequently, sample labels were assigned to all cells based on their assignments from both methods (see Methods for details).
Importantly, the hybrid strategy is f lexible with existing demultiplexing methods, enabling users to choose preferred algorithms for both cell hashing and genetic variants (Fig. 1A).However, it requires diverse genetic backgrounds for accurate genotype cluster generation in pooled samples, making it unsuitable for experiments that share the same genetic background (e.g.genetically homogenous animal models).Careful consideration of sample pooling and demultiplexing solutions during experimental design ensures accuracy and maximizes performance.A tree diagram is provided to guide users in selecting demultiplexing solutions based on their experimental configurations (Supplementary Fig. S1).

HTOreader, a demultiplexing tool to improve hashtag threshold determination in real-world datasets
As mentioned above, current demultiplexing methods for cell hashing can perform poorly in real-world datasets with many hashtags, imbalanced sample sizes or highly variable hashtag distributions.Here, we developed HTOreader, aiming to improve the accuracy and singlet rate of cell-hashing-based demultiplexing in various real-world scenarios (Fig. 2A).Unlike HTOdemux and MULTI_seq, which utilize heuristic algorithms, HTOreader independently handles each HTO to mitigate interference from imbalanced sample sizes.It aligns with existing methods in determining cutoffs from a bimodal distribution but employs a unique approach by determining an optimal cutoff using the global minimum value on the linear approximated density of normalized HTO counts between positive and negative groups, distinguishing it from other existing methods.
Below, we compare the performances between HTOreader and the established method, including MULTI-seq, HTOdemux, GMM-Demux, DropletUtils, and BFF, on both the benchmark (Stoeckius-2018) and several real-world datasets (3V007, S414, 8pool-CA, and R125; generated in-house).Among the five datasets analyzed, three (Stoeckius-2018, 8pool-CA and R125) possess benchmark labels derived from SNP-based methods.The benchmark labels for Stoeckius-2018 were obtained from a previous study and generated by scSplit, while those for 8pool-CA and R125 were produced using Souporcell (details in Supplementary Data) [10,16].These representative datasets cover a variety of experimental   The Stoeckius-2018 dataset, featuring eight hashtags and balanced sample sizes, serves as a benchmark dataset with available ground truth labels [ 8,9].As expected, all tested methods exhibited comparable singlet rates, ranging from 80% to 83% on Stoeckius-2018 (Supplementary Table S1).Upon validation with the ground truth labels, all methods delivered high performance in terms of accuracy, recall and F1 score (Supplementary Table S2).The performance of tested methods varies considerably when applied to real-world datasets.3V007 and S414 are both balanced datasets with cells from single donors labeled with two and four hashtags, respectively (Fig. 2B and C).For these two datasets, most methods achieved over 85% singlet rate, except for HTOdemux, which only reached 80% (Fig. 2B and C, Supplementary Tables S3 and S4).Interestingly, DropletUtils generated inconsistent results for 3V007 when compared with other methods and failed to identify any doublets, suggesting low accuracy for this program (Supplementary Table S3).In S414, antigen-specific B cells were labeled with hashtags 5 and 6, while carrier B and T cells were labeled with hashtags 3 and 4. We highlighted cells identified by HTOreader for each hashtag on a UMAP and found that the demultiplexing result is consistent with experimental setup (Fig. 2C).The 8pool-CA dataset comprises cells from eight human donors labeled with eight hashtags, with very few cells from donor S282 ( Fig. 2D).We observed significantly weaker peak separations in the hashtag data from 8pool-CA compared to the benchmark Stoeckius-2018, regardless of the normalization methods (Supplementary Fig. S2B and C).For 8pool-CA, two BFF models (BFF_raw and BFF_cluster) failed to identify any cell from sample S282, while the other methods produced reasonable singlet rates, ranging from 77.62% to 83.99% (Supplementary Table S5).Validation against the ground truth showed that all methods achieved high precision (>98.0%),except for BFF_cluster (90.6%) due to the failure in identifying S282 (Supplementary Table S6).All other methods also achieved high accuracy, recall and F1 scores (Supplementary Table S6, Fig. 2D).R125 comprises cells from three human donors and exhibits poor hashtag staining quality in plasmablasts, which express a disproportionally high level of immunoglobulin genes (Fig. 2E).As a result, the performance of all tested methods declined for R125.In particular, singlet rates calculated by MULTI_seq (49.7%) and HTOdemux (62.3%) were significantly lower than other methods due to the incorrect assignment of a large fraction of cells into the negative group.Interestingly, BFF_cluster generated the highest singlet rate (94.5%) (Supplementary Table S7) for R125 and outperformed all other methods when validating against ground truth labels (Fig. 2E, Supplementary Table S8).
We also investigated the performance of current methods with various parameter configurations, concentrating on those with adjustable parameters.The results indicated that HTOdemux and BFF_cluster reached optimal performance using the recommended parameter settings on most datasets.In contrast, the optimal parameters for MULTI_seq varied across different datasets, yet they were generally close to the default values.Additionally, even when the recommended parameters were not ideal, the difference in performance was marginal.It is important to note that BFF_cluster was unable to identify sample S282 in the 8pool-CA dataset, and MULTI_seq encountered difficulties with dataset R125, regardless of the parameter settings employed (see Method for details).
In conclusion, our results show that despite the fact that most methods perform reasonably well on the datasets we tested, it is important to highlight that some existing methods suffer from certain issues and exhibit suboptimal performance when demultiplexing specific datasets.These issues include the inability to identify hashtags, incorrect doublet identification and low singlet rates, despite attempts at parameter tuning (Table 1).Notably, HTOreader demonstrates high accuracy and robustness across various benchmark and real-world datasets (Table 1), consistently delivering reliable results in cell hashing experiments.

Hybrid demultiplexing improves cell recovery and cost-effectiveness for cell hashing
The problem of cell loss stemming from hashtag failures and false doublets cannot be addressed solely by enhancing cell hashing signal demultiplexing and warrants further investigation.To address these issues, we exploit the recent advances in SNP-based demultiplexing methods that cluster single cells based on their SNPs.Here, we propose a hybrid demultiplexing strategy that integrates demultiplexing results from cell hashing and genetic variant profiles (Fig. 1A).Using this hybrid strategy, sample identities of single cells that are poorly stained (negative) or contaminated by multiple hashtags (doublet) can be determined together with those singlets from the same genotype cluster.Notably, this hybrid strategy is compatible with existing cell-hashing-based (GMM-Demux, MULTI_seq, HTODemux, DropletUtils and BFF) and SNP-based (scSplit, Vireo and Souporcell) demultiplexing methods (Fig. 1A).In the following analysis, we use HTOreader and Souporcell as a representative combination.
To demonstrate its performance, we applied this hybrid strategy to a real-world dataset (8pool, Fig. 3A).In this dataset, a small aliquot of PBMCs from each of eight donors were stained with hashtag antibodies individually, and subsequently pooled together to sort carrier CD19+ B cells and CD4+ T cells (8pool-CA, Fig. 3C).Thereafter, all remaining PBMCs were pooled together to sort antigen-specific B cells (8pool-AS, Fig. 3B).We ran cells from 8pool-CA and 8pool-AS combined through Souporcell, and cells from 8pool-CA through HTOreader.Results showed that all hashtags were stained with high quality, and HTOreader divided all cells in 8pool-CA into 10 groups, including 10 821 cells from 8 singlet groups (84.0%), 1893 apparent doublets (14.7%) and 170 negative cells (1.3%) (Supplementary Table S5).In comparison, Souporcell divided all cells in 8pool into 10 genotype clusters.Among 12 884 cells from 8pool-CA, 11 550 cells were categorized into 8 singlet clusters (89.7%), 1326 cells into a doublet cluster (10.29%) and 8 cells into an unassigned cluster (0.06%) (Fig. 3D, Table 2).Using our hybrid strategy, the sample identities of eight singlet genotype clusters can be easily obtained by correlation with cell-hashing-based results (Table 2).97.9% (10 597 from 10 821) of singlets identified by HTOreader were also registered as singlets in Souporcell, validating the accuracy of both methods (Fig. 3D).Within the extra singlets recovered by hybrid demultiplexing (5.2%), a small fraction (166 cells) was identified as negative, and the majority was identified as doublet (715 cells) by cell hashing, suggesting a high false doublet rate when using many hashtags (Fig. 3B, Table 2).
In addition to improving cell recovery and calling accuracy, this hybrid strategy can also decrease the experimental cost  when processing samples with large numbers of cells.As we mentioned, cells from 8pool-AS and 8pool-CA shared the same donors and SNP profiles, so we ran Souporcell on all cells together to demultiplex cells from 8pool-AS that were not labeled with hashtags.Results showed that we identified 2285 cells in eight singlet clusters (82.67%), along with 416 apparent doublets (15.05%) and 63 negative cells (2.28%) (Table 2).Our hybrid strategy achieved an overall cell recovery rate of 88.03% (13 777 out of 15 650) with all cells from 8pool-AS and 8pool-CA combined (Fig. 3E and F, Table 2).Therefore, when processing samples in large scale using this hybrid strategy, only a small aliquot of cells from each donor (e.g. 100 cells per sample is sufficient for hybrid demultiplexing in practice) needs to be stained with hashtags to link donor identity with SNP clusters, and all cells can be efficiently demultiplexed through SNP-based clustering.This unique feature not only enhances experimental f lexibility but also reduces the consumption of hashtag antibodies per sample without sacrificing performance.Together, compared to demultiplexing with cell hashing alone, our hybrid strategy demonstrates enhancements in both cell recovery rate and cost-effectiveness.

Hybrid demultiplexing is more robust and reliable than unimodal methods
Another important strength of the hybrid strategy is its ability to deliver accurate results, largely independent of the hashtag staining quality or performance of cell hashing-based demultiplexing methods.For instance, regardless of the demultiplexing methods used in 8pool-CA, the hybrid strategy consistently correlated cell hashing singlet groups with genotype clusters and accurately determined the sample identity of almost 90% of cells (Table 2).In an extreme case where two BFF models failed to identify cells from sample S282, the hybrid strategy successfully correlated seven cell hashing groups identified by the BFF models with genotype clusters and revealed that a cluster of singlets represented cells from sample S282, as they were not labeled by any hashtags (Supplementary Tables S9 and S10).Therefore, the hybrid strategy permits up to one unlabeled sample [among N samples, (N − 1) being labeled with hashtags], either due to experimental design or hashtag failure.Unsurprisingly, for the other three methods that generated high quality results, our hybrid strategy also seamlessly correlated cell hashing singlet groups with genotype clusters (Supplementary Tables S11-S13).
Although SNP-based methods are known to be highly accurate, we observed that Souporcell can generate incorrect genotype clusters in datasets with high true-doublet rates (Fig. 4A).When we applied our hybrid strategy to dataset 9pool-CA with T and B cells from nine donors, we observed a poor correlation (convergence score = 0.1) in singlets between HTOreader and Souporcell (Fig. 4D and E, Supplementary Table S14).With further analysis, we identified transcriptional clusters consisting of apparent true cell doublets expressing both B cell (CD19, MS4A1) and T cell (CD3E, IL7R) markers (Fig. 4B).Besides these B&T doublets, cell hashing signals revealed that there were more doublets in B cell and T cell clusters (Fig. 4C).Therefore, we suspected that the unexpectedly high proportion of true doublets in this dataset might interfere with the unsupervised clustering process of SNP-based methods.To test our hypothesis, we removed cell clusters expressing both B and T cell markers and observed an increased correlation (convergence score = 0.55, Fig. 4F, Supplementary Table S15).Then, we further removed all doublets assigned by cell hashing (Fig. 4D) and found that the convergence score reached 0.95 and demultiplexing results went back to normal (Fig. 4G, Supplementary Table S16).These results highlight the risk associated with SNP-based methods that use unsupervised clustering algorithms.
In conclusion, our results from dataset 8pool-CA show that the hybrid strategy leads to an enhanced singlet rate and accuracy compared to demultiplexing with cell hashing alone.Additionally, results from dataset 9pool-CA indicate that the hybrid strategy can identify incorrect genotype clustering generated by SNP-based methods, thereby highlighting potential experimental f laws.By integrating results from cell-hashing-based and SNPbased methods, the hybrid strategy allows the two methods to validate results from each other and correct potential errors, making it more robust and reliable than unimodal methods.

Discussion
Over the last few years, sample super loading and demultiplexing strategies have been extensively studied to improve the quality, magnitude, and economy of single-cell experiments.Among them, cell hashing that derives from CITE-seq technology has been one of the most popular demultiplexing approaches due to its simplicity and compatibility with common sequencing platforms.However, the current cell hashing technique still has several disadvantages, including difficulties in distinguishing true positive from a background signal, unpredictable hashtag failure, high false doublet rates when using many hashtags and high reagent costs when staining large numbers of cells.Here, we developed an R package that includes an improved cell hashing demultiplexing tool, HTOreader and utilizes a hybrid demultiplexing strategy to integrate the results of cell hashing and SNP clustering, aiming to address the issues above.
One major caveat of cell hashing is the proper cutoff calling for cell hashing signals.Among existing cell-hashing demultiplexing methods, the mathematical models underlying different methods are based on various assumptions about the distribution of cell-hashing signals.When the actual distribution significantly deviates from these assumptions, the methods may fail, leading to serious consequences, such as misdetection of hashtags or inf lated negative/doublet rate.Our findings indicate that existing methods faltered on certain datasets, and this was not rectifiable by adjusting parameters.Despite attempts to optimize performance through parameter tuning on multiple datasets, our results suggested that while tuning parameters can marginally improve performance, it is not sufficient to overcome failures on challenging datasets, which are likely due to significant deviations from the assumptions made by the mathematical models.
Another major challenge is that the cell recovery rate decreases as the number of hashtags increases, due to the accumulation of false doublets.In addition, the poor staining quality of certain hashtags that occur sporadically can further decrease cell recovery by increasing the negative rate.The SNP-based demultiplexing method is highly precise and capable of recovering up to 90% of cells.However, its utility is constrained by the need for distinct genetic backgrounds among samples and the additional expense of creating SNP references.Consequently, we developed a hybrid demultiplexing strategy that combines cell-hashing-based methods with SNP-based methods, allowing the strengths of each method to compensate for the other's weaknesses and yield the most precise results.Our results showed that this hybrid strategy consistently increases cell recovery to ∼90%, regardless of hashtag numbers or staining quality.This approach is extremely useful to simultaneously demultiplex multiple pooled samples from the same group of donors (such as longitudinal studies); only cells from one timepoint need to be hashtagged to obtain donor identities for cells from all timepoints (e.g.9pool-CA + 9pool-AS, 8pool-CA + 8pool-AS).Additionally, our findings indicate that the hybrid strategy effectively addresses common errors in cell hashing demultiplexing, such as misidentifying hashtags and low singlet rates.Conversely, it also sensitively detects f laws in SNP-based demultiplexing, especially when cells lack genetic distinctiveness, thus providing a warning of potential data quality issues.Thus, in contrast to unimodal approaches, multimodal methods hold promise for yielding more precise and reliable demultiplexing results [ 20].To conclude, our hybrid strategy provides a solution to improve the performance, robustness, and cost-effectiveness of single-modal cell hashing-and SNP-based methods.
Despite the advantages offered by the hybrid demultiplexing strategy, it still has several limitations.First, SNP-based demultiplexing only works with samples that have a distinct genetic background.As a result, it is not applicable in most animal studies or longitudinal studies with samples from the same donor.Additionally, the performance of the hybrid strategy is largely dependent on the quality of SNP-based clustering.Despite being more robust than cell-hashing-based methods, SNP-based clustering can still produce low-quality results due to multiple reasons, including cell populations with abnormal gene expression profiles (e.g.plasmablasts or dying cells), high true doublet rates, and poor transcriptome quality (e.g.insufficient reads or ambient RNA contamination).Thus, future efforts to improve the robustness of SNP-based clustering are necessary, which in turn will also enhance the performance of hybrid demultiplexing.Theoretically, this hybrid strategy is compatible with various barcode-based methods besides cell hashing; we plan to explore this potential by adapting it to other demultiplexing methods and examining performance using real-world datasets in future studies.

Key Points
• A hybrid demultiplexing strategy (cell hashing and genetic correlation) increases both cell recovery rate and calling accuracy.• HTOreader accurately distinguishes true positives from background signal in datasets with (1) many hashtags or (2) imbalanced sample sizes.• The hybrid approach enhances cost-effectiveness of cell hashing and consistently produces reliable demultiplexing results, regardless of hashtag staining quality.• The hybrid strategy is compatible with existing singlecell experimental protocols and computational analysis tools.

Figure 1 .
Figure 1.The architecture of hybrid demultiplexing strategy workf low for cell hashing data.(A) Graphical demonstration of the principle of hybrid demultiplexing.(B) Workf low and crucial components of hybrid demultiplexing strategy in conventional single-cell RNA-seq analysis.HTOreader was developed for cell-hashing-based demultiplexing and an existing tool Souporcell is the default SNP-based demultiplexing tool.This workf low is also compatible with other existing methods as demonstrated in the figure.Figure is created with BioRender.com.
Figure 1.The architecture of hybrid demultiplexing strategy workf low for cell hashing data.(A) Graphical demonstration of the principle of hybrid demultiplexing.(B) Workf low and crucial components of hybrid demultiplexing strategy in conventional single-cell RNA-seq analysis.HTOreader was developed for cell-hashing-based demultiplexing and an existing tool Souporcell is the default SNP-based demultiplexing tool.This workf low is also compatible with other existing methods as demonstrated in the figure.Figure is created with BioRender.com.

Figure 2 .
Figure 2. HTOreader achieves high accuracy and increases cell recovery and robustness across real-world datasets.(A) The workf low of HTOreader to determine proper cutoff for each cell hashtag.(B) Demultiplexing on 3V007.Left: expression heatmap of both hashtags in 3V007.Center: violin plot of expression of both hashtags on HTOreader groups.Right: Singlet rates of all seven tested methods on 3V007.(C) Demultiplexing on S414.Left: expression heatmap of all hashtags in S414; center: singlet rates of all seven tested methods on S414; right: cells on a UMAP plot grouped by HTOreader demultiplexing and colored by cell types.(D) Demultiplexing on 8pool-CA.Left: expression of eight cell hashtags on dataset 8pool-CA.Singlet, doublet, and negative groups are indicated at the top of the heatmap.Right: performance of seven tested methods on 8pool-CA.(E) Demultiplexing on R125.Left: expression of three hashtags on the UMAP embedding of dataset4.Center: expression heatmap of all hashtags.Right: Performance of seven tested methods on R125.

Figure 3 .
Figure 3.The hybrid demultiplexing strategy: enhancing cell recovery, robustness, and cost-effectiveness in a real-world immune cell dataset.(A) Depicts configurations and hybrid demultiplexing applied to datasets 8pool-CA and 8pool-AS.(B) UMAP visualization of expression of CD27 protein and BACH2 gene, somatic hypermutation (SHM) number of BCR heavy chain, and isotype of IG genes of all antigen-specific B cells (8pool-AS).(C) Expression of B cell (CD19 and MS4A1) and T cell (CD3E and IL7R) marker genes in carrier cells (dataset 8pool-CA).(D) A heatmap of correlation between cell hashing demultiplexing and SNP-based demultiplexing across all cells in dataset 8pool-CA.True doublets and false doublets were highlighted in thick black border, while genotype cluster-cell hashing pairs were highlighted in thick border.(E) Cells demultiplexed by this hybrid strategy of dataset 8pool-AS and dataset 8pool-CA are visualized on an integrated UMAP individually.(F) Demultiplexing results of different methods across both 8pool-CA and 8pool-AS datasets.Two BFF models were highlighted by red and dashed borders due to the failure of identifying sample S282 in dataset 8pool-CA.Panel A is created with BioRender.com.

Figure 4 .
Figure 4. Hybrid demultiplexing strategy corrects potential error of SNP-based method.(A) Expression of nine cell hashtags on dataset 9pool-CA.Singlet, doublet, and negative groups are indicated on the top of the heatmap.(B) Expression of two B cell gene markers (CD19 and MS4A1) and two T cell gene markers (CD3E and IL7R) visualized on a UMAP embedding of carrier cells of dataset 9pool-CA.(C) Doublets, negative cells and singlets identified by cell hashing method are visualized on a UMAP individually.Cells are colored by transcriptome clusters.(D) Expression of nine hashtags on genotype clusters generated by SNP-based method.(E) A heatmap of correlation between cell hashing demultiplexing and SNP-based demultiplexing on all cells in dataset 9pool-CA.(F) A heatmap of correlation between cell hashing demultiplexing and SNP-based demultiplexing on all cells of dataset 9pool-CA after removing B&T doublets (cells that express both B and T cell markers).(G) A heatmap of correlation between cell hashing demultiplexing and SNP-based demultiplexing on all cells of dataset 9pool-CA after all potential doublets (cells that express more than one hashtag).

Table 2 .
Hybrid demultiplexing on dataset 8pool-CA and 8pool-AS.SNP demultiplexing method is Souporcell, cell hashing demultiplexing method is HTOreader